@snbianco
Collaborator

@snbianco snbianco commented Feb 17, 2025

Adding the ASDFCutout class as a child of ImageCutout. This one actually does change the existing API in a few ways:

  1. You can pass in multiple files to asdf_cut and ASDFCutout in the same way that you can with fits_cut.
  2. Because of 1, the output_file parameter in asdf_cut is now deprecated.
  3. Instead of returning a Cutout2D object when memory_only=True, ASDFCutout returns asdf.AsdfFile objects by default and HDUList objects if output_format='fits'. (I'm still a little conflicted on this, actually. Would it be more valuable to return the Cutout2D objects? Should we have a parameter so users can choose?)

I don't love that these will change the API, but asdf_cut is still very new and I would guess that there's not a large number of people using it yet. These changes are also kind of unavoidable if we want asdf_cut to be able to handle multiple input files at once. I'd love to hear any thoughts that you might have on this! Once this is merged and released, I'll have to update the Roman cutouts notebook on the science platform.

Some other changes/additions:

  • Because of the new architecture, you can choose to output ASDF cutouts in image format.

Base automatically changed from ASB-30252-FITS-Cutout to main February 17, 2025 19:00
@snbianco snbianco marked this pull request as ready for review February 18, 2025 00:43
@snbianco snbianco changed the title Asdf cutout Add ASDFCutout to General Architecture Feb 19, 2025
AlexReedy previously approved these changes Feb 19, 2025
Collaborator

@AlexReedy AlexReedy left a comment

This is looking good! It's evolved a bit from the changes in asdf-generalization, but I do think it's a better direction. As for changing the tests, I think that is necessary too, since the underlying architecture is changing and the tests should capture that.

I also agree that changing the API is necessary. With the general revamping of astrocut, I don't see these changes being avoidable.

Contributor

@havok2063 havok2063 left a comment

This is looking good. I think it's fine to change the API now. This code is still in development, and not being used much. Better to make all these changes now before Roman comes online and we get users/tools actively using this.

Regarding point 3, I think it's useful to be able to access the Cutout2D object. I find it useful for inspection and troubleshooting. I do think the return objects from the user-level cutout classes, like FITSCutout, ASDFCutout, should be consistent with each other. Cutout2D doesn't need to be a default user output though. Maybe we can store the cutout as an attribute, and document how to access it.

Along those same lines, we should consider any other items we want to expose to the user as attributes or properties. I think there are two workflows to consider: one where users just make a cutout and get the output file in one go, and one where users make a cutout but want to interact with it within the same session.

Comment on lines +115 to +143
def _get_cloud_http(self, input_file: Union[str, S3Path]) -> str:
    """
    Get the HTTP URL of a cloud resource from an S3 URI.

    Parameters
    ----------
    input_file : str | S3Path
        The input file S3 URI.
Contributor

@havok2063 havok2063 Feb 20, 2025

Since we also have other missions in the cloud (JWST, HST, TESS), we may want to consider moving this into ImageCutout, so we can eventually take advantage of cloud cutouts for those missions, or for FITS files.

Collaborator Author

FITSCutout is actually already cloud-enabled. It is a bit awkward, but _get_cloud_http is here because the asdf.open function can't handle S3 URIs in the same way that fits.open can.
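
For readers unfamiliar with the conversion being discussed, the following is a minimal sketch of turning an S3 URI into an HTTPS URL. It is not the PR's `_get_cloud_http` implementation: the function name `s3_uri_to_http` is hypothetical, and the sketch assumes a publicly readable bucket addressed in the virtual-hosted style with no region in the hostname.

```python
from urllib.parse import quote, urlparse


def s3_uri_to_http(s3_uri: str) -> str:
    """Convert ``s3://bucket/key`` to a virtual-hosted-style HTTPS URL.

    Hypothetical helper for illustration only; assumes a public bucket.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    # Percent-encode the key but keep the path separators intact.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


print(s3_uri_to_http("s3://example-bucket/path/to/file.asdf"))
```

A reader that only understands HTTP(S) URLs could then be handed the returned string instead of the original S3 URI.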

Comment on lines 156 to 160
    The FITS WCS of the image. This is approximated from the GWCS object.
gwcs : `~gwcs.wcs.WCS`
    The GWCS of the image.
pixel_coords : tuple
    The pixel coordinates closest to the center of the cutout.
Contributor

This Returns section needs updating to match what is actually returned, or vice versa.

Comment on lines +198 to +217
img_cutout = Cutout2D(data,
                      position=pixel_coords,
                      wcs=wcs,
                      size=(self._cutout_size[1], self._cutout_size[0]),
Contributor

What's the plan again for using Cutout2D as the main cutout method, in FITSCutout or ImageCutout? It would be nice if we could consolidate the cutout functionality and have a consistent approach across file types and missions. Are there roadblocks to this?

Collaborator Author

Yes, the ultimate goal is to have Cutout2D be the main method in ImageCutout! The issue with FITSCutout is that astropy couldn't handle HDUList.section objects when creating partial cutouts. Using the section attribute is a huge speed-up, so I elected to prioritize that over using Cutout2D. I put in a PR to fix this that was recently merged, but I'm not sure when the next astropy release will be.

Comment on lines +41 to +43
def asdf_cut(input_files: List[Union[str, Path, S3Path]],
             ra: float,
             dec: float,
Contributor

Similar to the other xxx_cut functions, can we add a sentence to the docstring that says we're retaining this for backwards compatibility?

@scfleming
Collaborator

Just make sure that when we merge things into main, we don't break the existing tutorial we delivered for Roman ASDF cutouts, or else that we modify the tutorial to work with any changes that impact it. Any previously delivered requirement may be tested with every Build.

@snbianco
Collaborator Author

> Just make sure that when we merge things into main, we don't break the existing tutorial we delivered for Roman ASDF cutouts, or else that we modify the tutorial to work with any changes that impact it. Any previously delivered requirement may be tested with every Build.

I'll create a draft MR with roman_notebooks that we can mark as ready when this is merged in.

@scfleming
Collaborator

Thanks. Does the Roman Nexus pin a particular astrocut version? If it does, we could also tell them not to update astrocut in Nexus until we know the notebook tutorial has been updated to support any backwards-incompatible changes in the newer astrocut version. Long-term, once the mission is live, I don't think that's a great way to go, since ideally the Nexus makes the latest stable versions of packages available to users as quickly as possible. But while things need to be stable for testing prior to launch, it might be useful, or is it already being done?

@snbianco
Collaborator Author

I would assume that the Roman Nexus does have specific versions pinned in the same way that TIKE does. I'm not sure how to find them though, and I don't see anything in the roman_notebooks repository. I'll ask around!

@snbianco
Collaborator Author

While addressing Brian's comments, I've run into another design choice that I'm conflicted about.

Previously, fits_cut returned a list of file paths pointing to the cutouts. Because of this, I designed FITSCutout to return a list of file paths if memory_only=False, and a list of memory objects if memory_only=True. However, the current version of asdf_cut always returns a Cutout2D object, regardless of whether the cutout file is written or not.

I personally prefer asdf_cut's method of always returning the memory objects rather than a list of file paths. However, I do want FITSCutout and ASDFCutout to be consistent with each other, and changing what fits_cut returns will be problematic.

I could add another parameter, something like return_paths, to control what is returned when files are written to disk. I'm still not sure what the default behavior should be though. Does anyone have thoughts?

@scfleming
Collaborator

In what ways would changing fits_cut return to work the same way as asdf_cut be problematic? Not knowing the codebase, would there be a path forward where, with extra work, fits_cut always returns a list of memory objects too, and then uses (the same, similar) function to choose to write it to a file at a particular location like presumably asdf_cut does now? Is fits_cut particularly user-facing, or would changing what it returns mostly require internal reconfiguring?

@snbianco
Collaborator Author

fits_cut is a user-facing function, and there wasn't an option to return memory objects before this re-org, so changing the default return value would have a large impact on any workflows that use it. fits_cut and asdf_cut are being maintained so that we have backwards compatibility, and we do note in the code (and will in the documentation) that we recommend using FITSCutout and ASDFCutout. Something I could do is add a return_paths parameter to FITSCutout with a default of False (so that it returns memory objects), but set the parameter to True when we call FITSCutout in fits_cut (so that paths are returned by default for that function only).
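
The `return_paths` idea above can be sketched as follows. This is only an illustration under stated assumptions: `SketchCutout` and `sketch_fits_cut` are hypothetical stand-ins, not astrocut's FITSCutout or fits_cut, and no files are actually written.

```python
from typing import List, Union


class SketchCutout:
    """Hypothetical stand-in for a cutout class; not astrocut's FITSCutout."""

    def __init__(self, input_files: List[str], return_paths: bool = False):
        self.input_files = input_files
        self.return_paths = return_paths

    def cutout(self) -> List[Union[str, dict]]:
        if self.return_paths:
            # A real implementation would write files here; we only build names.
            return [f"{name}_cutout.fits" for name in self.input_files]
        # Pretend each "memory object" is a dict holding the cutout data.
        return [{"source": name, "data": [[0]]} for name in self.input_files]


def sketch_fits_cut(input_files: List[str]) -> List[str]:
    """Legacy-style wrapper that keeps returning paths for compatibility."""
    return SketchCutout(input_files, return_paths=True).cutout()


print(SketchCutout(["a", "b"]).cutout())  # memory objects by default
print(sketch_fits_cut(["a", "b"]))        # paths from the legacy wrapper
```

The class defaults to memory objects, while the wrapper pins the old behavior, so existing callers of the function see no change.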

@havok2063
Contributor

I've given this some more thought. Here are my two cents. IMO, as a user, the way that I'd want to work with FITSCutout or ASDFCutout is to be able to instantiate these objects, and then interact with them, i.e. create, inspect, and display the cutout, then write it out to a file of my choice. To support that, I would expose the _write_xxx methods to the user (make them public methods), expose the Cutout2D object as an attribute (and any other attributes that may be useful), and add new methods for quick inspection and display of the cutout overlay, wcs, etc. I would want to do things like quickly plot the original image and cutout overlay, plot just the cutout with wcs, inspect the wcs object, overlay the center target or other coordinates, run some validator that the cutout is correct, and stitch cutouts together.

I also would want a convenience function that combines all those steps into a single function that I could run easily, for when I trust the code or don't care. It creates the cutout, saves the file, and returns me the filepath on disk. This is what the xxx_cut functions could be. They could retain the existing functionality of only returning the filepath(s). In this case we would change asdf_cut slightly. These functions could be loaded with all the user option kwargs to make choices and the conditional logic would live here, rather than pushed into the .cutout() method. Something of the sort...

def fits_cut(...):
    ff = FITSCutout(...)
    cutout = ff.make_cutout(...)  # also sets ff.cutout
    if as_fits:
        return ff.write_as_fits()
    elif as_img:
        return ff.write_as_img()
    elif memory_only:
        return cutout
    elif ...:
        ...
    else:
        return ff.write_as_fits()

This could create some potential repeated boilerplate code between fits_cut, img_cut, and asdf_cut, so there may be a better way. What do you think of this approach? Or other thoughts on this?
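
A runnable toy version of the pseudocode above may make the split clearer: the class only makes and writes cutouts, and the wrapper holds all the option handling. Everything here is an assumption for illustration: `SketchCutter` and `sketch_cut` are hypothetical names, the "image" is a nested list, and the writers dump text placeholders rather than real FITS or image files.

```python
import tempfile
from pathlib import Path
from typing import List


class SketchCutter:
    """Hypothetical stand-in for the class in the pseudocode above."""

    def __init__(self, data: List[List[int]]):
        self.data = data
        self.cutout = None  # set by make_cutout()

    def make_cutout(self, rows: slice, cols: slice) -> List[List[int]]:
        self.cutout = [row[cols] for row in self.data[rows]]
        return self.cutout

    def _write(self, out_dir, suffix: str) -> Path:
        path = Path(out_dir) / f"cutout{suffix}"
        path.write_text(repr(self.cutout))  # placeholder for a real writer
        return path

    def write_as_fits(self, out_dir) -> Path:
        return self._write(out_dir, ".fits")

    def write_as_img(self, out_dir) -> Path:
        return self._write(out_dir, ".png")


def sketch_cut(data, rows, cols, out_dir, *, as_img=False, memory_only=False):
    """Convenience wrapper: option handling lives here, not in the class."""
    cutter = SketchCutter(data)
    cutout = cutter.make_cutout(rows, cols)
    if memory_only:
        return cutout
    if as_img:
        return cutter.write_as_img(out_dir)
    return cutter.write_as_fits(out_dir)


image = [[r * 4 + c for c in range(4)] for r in range(4)]
with tempfile.TemporaryDirectory() as tmp:
    print(sketch_cut(image, slice(1, 3), slice(1, 3), tmp, memory_only=True))
    print(sketch_cut(image, slice(1, 3), slice(1, 3), tmp).name)
```

Interactive users would work with the class directly and inspect its `cutout` attribute, while batch users would call only the wrapper and get a path back.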

@snbianco
Collaborator Author

snbianco commented Mar 4, 2025

I love this idea! It took me a while to change things around, but I got something working that majorly decreases the complexity of these classes by moving the logic for writing files into separate functions (write_as_image, write_as_asdf, and write_as_fits).

Added attributes for ASDFCutout:

  • cutouts_by_file: dict of Cutout2D objects stored by input filepath
  • cutouts: list of Cutout2D objects
  • asdf_cutouts: list of asdf.AsdfFile objects
    • "Lazy loaded" so this attribute isn't assigned until called
  • fits_cutouts: list of fits.HDUList objects
    • "Lazy loaded" so this attribute isn't assigned until called

Added attributes for FITSCutout:

  • cutouts_by_file: dict of numpy arrays stored by input filepath
    • Eventually, I want the values here to be Cutout2D objects like in ASDFCutout
  • hdu_cutouts_by_file: dict of ImageHDU objects stored by input filepath
  • fits_cutouts: list of fits.HDUList objects
    • "Lazy loaded" so this attribute isn't assigned until called
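
One common way to get the "lazy loaded" behavior described in these lists is `functools.cached_property`. The sketch below is an assumption about how such an attribute could work, not a copy of the PR's code; `SketchCutouts` is a hypothetical class and plain dicts stand in for HDUList objects.

```python
from functools import cached_property
from typing import Dict, List


class SketchCutouts:
    """Hypothetical container; not astrocut's ASDFCutout or FITSCutout."""

    def __init__(self, cutouts_by_file: Dict[str, List[List[int]]]):
        self.cutouts_by_file = cutouts_by_file
        self.conversions = 0  # counts how often the expensive step runs

    @property
    def cutouts(self) -> List[List[List[int]]]:
        return list(self.cutouts_by_file.values())

    @cached_property
    def fits_cutouts(self) -> List[dict]:
        # Stand-in for building HDUList objects; runs on first access only.
        self.conversions += 1
        return [{"file": name, "data": data}
                for name, data in self.cutouts_by_file.items()]


sketch = SketchCutouts({"a.fits": [[1, 2]], "b.fits": [[3, 4]]})
print(sketch.conversions)   # nothing has been converted yet
sketch.fits_cutouts
sketch.fits_cutouts
print(sketch.conversions)   # the conversion ran once despite two accesses
```

Users who never ask for the converted objects never pay for the conversion, which matters when one instance holds many cutouts.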

For ImageCutout, I added an image_cutouts property which calls the function get_image_cutouts that accepts the normalization parameters as input and returns the image cutouts as memory objects.

You had some great ideas for other features like plotting, exposing a WCS, etc. I'm still thinking about how features like these will work best given that the classes take multiple input files and can generate a lot of different cutouts at once. I could save the WCS for every cutout, but that seems like a waste of resources. I could also add these things as separate functions, but require a single cutout to be passed in as input? I'm not sure.

@snbianco
Collaborator Author

snbianco commented Mar 4, 2025

I think I came up with a pretty decent solution! I added a nested class, CutoutInstance, to FITSCutout that "mimics" the attributes of Cutout2D. This eliminates some of the complexity in handling different types of objects between the cutout classes. Eventually, when Cutout2D is updated and a new version of Astropy is released, CutoutInstance can be replaced with Cutout2D.

FITSCutout.cutouts_by_file is a dictionary where the keys are input filenames and the values are lists of CutoutInstance objects. Using this attribute, users can access individual cutout characteristics like data, WCS, and cutout slices.
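
To illustrate the shape of this structure, here is a small sketch of a record type that exposes a few Cutout2D-style attribute names (`data`, `wcs`, `shape`, `slices_original`) and a dictionary keyed by input filename. `SketchCutoutInstance` and `make_instance` are hypothetical and are not the PR's CutoutInstance; nested lists stand in for arrays and the WCS is left as a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SketchCutoutInstance:
    """Hypothetical record exposing Cutout2D-like attribute names."""
    data: List[List[float]]
    slices_original: Tuple[slice, slice]  # where the cutout sits in the source
    wcs: object = None  # placeholder; a real one would hold a WCS object

    @property
    def shape(self) -> Tuple[int, int]:
        return (len(self.data), len(self.data[0]) if self.data else 0)


def make_instance(image, rows: slice, cols: slice) -> SketchCutoutInstance:
    data = [row[cols] for row in image[rows]]
    return SketchCutoutInstance(data=data, slices_original=(rows, cols))


image = [[float(r * 3 + c) for c in range(3)] for r in range(3)]
cutouts_by_file: Dict[str, List[SketchCutoutInstance]] = {
    "example.fits": [make_instance(image, slice(0, 2), slice(1, 3))],
}
first = cutouts_by_file["example.fits"][0]
print(first.shape, first.slices_original)
```

Because the attribute names match, code written against this kind of record should need little change if the values are later swapped for Cutout2D objects.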

Another change I made is how FITSCutout handles empty cutouts when single_outfile=True. In the current implementation, ImageHDUs with empty data are still added to the output HDUList. I think it's more performant and user-friendly to skip these HDUs and just make the output file with the files and extensions that have cutout data. I added a warning message that logs when an input file has no cutout data and will be skipped. However, this is a change to how the fits_cut function currently behaves, and in rare cases, it could break a user's code. I'm open to any thoughts on this! If we do go through with the change, I will make sure that the documentation reflects it.
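
The skip-and-warn behavior could look roughly like the sketch below. It assumes that "empty" means every pixel is zero or NaN, which is an assumption made for this example rather than something stated in the thread; `is_empty` and `drop_empty_cutouts` are hypothetical helpers operating on nested lists.

```python
import logging
import math
from typing import Dict, List

logger = logging.getLogger("cutout_sketch")


def is_empty(data: List[List[float]]) -> bool:
    """Assumed definition: every pixel is zero or NaN."""
    return all(v == 0 or math.isnan(v) for row in data for v in row)


def drop_empty_cutouts(
    cutouts: Dict[str, List[List[float]]]
) -> Dict[str, List[List[float]]]:
    """Return only the cutouts that contain data, warning about the rest."""
    kept = {}
    for name, data in cutouts.items():
        if is_empty(data):
            logger.warning("Cutout of %s contains no data and will be skipped.", name)
            continue
        kept[name] = data
    return kept


result = drop_empty_cutouts({
    "a.fits": [[0.0, 0.0]],
    "b.fits": [[0.0, 1.5]],
    "c.fits": [[float("nan")]],
})
print(list(result))
```

Only the surviving entries would then be combined into the single output file, so downstream code should not assume one extension per input file.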

@snbianco snbianco requested a review from havok2063 March 5, 2025 15:19
@havok2063
Contributor

havok2063 commented Mar 7, 2025

I'll respond briefly to your comments here, and then do a full PR review in a bit.

> You had some great ideas for other features like plotting, exposing a WCS, etc. I'm still thinking about how features like these will work best given that the classes take multiple input files and can generate a lot of different cutouts at once. I could save the WCS for every cutout, but that seems like a waste of resources. I could also add these things as separate functions, but require a single cutout to be passed in as input? I'm not sure.

Yeah I'm not sure if we need to have all class methods support batch mode. I imagine using the classes themselves mostly on a per image/cutout basis for inspection, where each method works on a single cutout.

I hadn't fully appreciated or thought through working only in batch mode. We may want to brainstorm a bit more on some workflows. I feel most use cases of batching cutouts would be to run them via the convenience functions to return to later, and less via the interactive class instance itself. One scenario: a user has a bunch of images they want to make cutouts from. They instantiate FITSCutout on one image to test and inspect the cutout, check settings, etc. They then run the whole set in batch mode. Later they want to spot check some of the cutout files. It would be great if they could reload a cutout file via FITSCutout after the fact to inspect and display it.

This may change the scope of what Astrocut should do though. If they don't already, I think the cutout files should contain metadata about the original source file and the cutout info in their headers. In principle, one could then do something like ff = FITSCutout.from_cutout('path/to/cutout_file.fits') to reload it into ff.cutout, then use the class methods to inspect it. It could even reload the original source file if available. But this workflow may require some more thought. All this would be for a future PR.

Contributor

@havok2063 havok2063 left a comment

This looks good to me.

@snbianco snbianco merged commit 8a081f1 into main Mar 9, 2025
8 checks passed
@snbianco snbianco deleted the ASDF-Cutout branch March 9, 2025 18:23